25 research outputs found
Reliable and Real-Time Distributed Abstractions
The celebrated distributed computing approach for building systems and services using multiple machines continues to expand to new domains. Computation devices nowadays have additional sensing and communication capabilities, while becoming, at the same time, cheaper, faster and more pervasive. Consequently, areas like industrial control, smart grids and sensor networks are increasingly using such devices to control and coordinate system operations. However, compared to classic distributed systems, such real-world physical systems have different needs, e.g., real-time and energy efficiency requirements. Moreover, constraints that govern communication are also different. Networks become susceptible to inevitable random losses, especially when utilizing wireless and power line communication. This thesis investigates how to build various fundamental distributed computing abstractions (services) given the limitations, the performance and the application requirements and constraints of real-world control, smart grid and sensor systems. In quest of completeness, we discuss four distributed abstractions starting from the level of network links all the way up to the application level. At the link level, we show how to build an energy-efficient reliable communication service. This is especially important for devices with battery-powered wireless adapters where recharging might be unfeasible. We establish transmission policies that can be used by processes to decide when to transmit over the network in order to avoid losses and minimize re-transmissions. These policies allow messages to be reliably transmitted with minimum transmission energy. One level higher than links is failure detection, a software abstraction that relies on communication for identifying process crashes. We prove impossibility results concerning implementing classic eventual failure detectors in networks with probabilistic losses. We define a new implementable type of failure detectors, which preserves modularity. This means that existing deterministic algorithms using eventual failure detectors can still be used to solve certain distributed problems in lossy networks: we simply replace the existing failure detector with the one we define. Using failure detectors, processes might get information about failures at different times. However, to ensure dependability, environments such as distributed control systems (DCSs), require a membership service where processes agree about failures in real time. We prove that the necessary properties of this membership cannot be implemented deterministically, given probabilistic losses. We propose an algorithm that satisfies these properties, with high probability. We show analytically, as well as experimentally (within an industrial DCS), that our technique significantly enhances the DCS dependability, compared to classic membership services, at low additional cost. Finally, we investigate a real-time shared memory abstraction, which vastly simplifies programming control applications. We study the feasibility of implementing such an abstraction within DCSs, showing the impossibility of this task using traditional algorithms that are built on top of existing software blocks like failure detectors. We propose an approach that circumvents this impossibility by attaching information to the failure detection messages, analyze the performance of our technique and showcase ways of adapting it to various application needs and workloads
You Only Live Multiple Times: A Blackbox Solution for Reusing Crash-Stop Algorithms In Realistic Crash-Recovery Settings
Distributed agreement-based algorithms are often specified in a crash-stop asynchronous model augmented by Chandra and Toueg\u27s unreliable failure detectors. In such models, correct nodes stay up forever, incorrect nodes eventually crash and remain down forever, and failure detectors behave correctly forever eventually, However, in reality, nodes as well as communication links both crash and recover without deterministic guarantees to remain in some state forever.
In this paper, we capture this realistic temporary and probabilitic behaviour in a simple new system model. Moreover, we identify a large algorithm class for which we devise a property-preserving transformation. Using this transformation, many algorithms written for the asynchronous crash-stop model run correctly and unchanged in real systems
RT-ByzCast: Byzantine-Resilient Real-Time Reliable Broadcast
Todayâs cyber-physical systems face various impediments to achieving their intended goals, namely, communication uncertainties and faults, relative to the increased integration of networked and wireless devices, hinder the synchronism needed to meet real-time deadlines. Moreover, being critical, these systems are also exposed to significant security threats. This threat combination increases the risk of physical damage. This paper addresses these problems by studying how to build the first real-time Byzantine reliable broadcast protocol (RTBRB) tolerating network uncertainties, faults, and attacks. Previous literature describes either real-time reliable broadcast protocols, or asynchronous (non real-time) Byzantine ones. We first prove that it is impossible to implement RTBRB using traditional distributed computing paradigms, e.g., where the error/failure detection mechanisms of processes are decoupled from the broadcast algorithm itself, even with the help of the most powerful failure detectors. We circumvent this impossibility by proposing RT-ByzCast, an algorithm based on aggregating digital signatures in a sliding time-window and on empowering processes with self-crashing capabilities to mask and bound losses. We show that RT-ByzCast (i) operates in real-time by proving that messages broadcast by correct processes are delivered within a known bounded delay, and (ii) is reliable by demonstrating that correct processes using our algorithm crash themselves with a negligible probability, even with message loss rates as high as 60%
Qualitative Analysis for Validating IEC 62443-4-2 Requirements in DevSecOps
Validation of conformance to cybersecurity standards for industrial
automation and control systems is an expensive and time consuming process which
can delay the time to market. It is therefore crucial to introduce conformance
validation stages into the continuous integration/continuous delivery pipeline
of products. However, designing such conformance validation in an automated
fashion is a highly non-trivial task that requires expert knowledge and depends
upon the available security tools, ease of integration into the DevOps
pipeline, as well as support for IT and OT interfaces and protocols.
This paper addresses the aforementioned problem focusing on the automated
validation of ISA/IEC 62443-4-2 standard component requirements. We present an
extensive qualitative analysis of the standard requirements and the current
tooling landscape to perform validation. Our analysis demonstrates the coverage
established by the currently available tools and sheds light on current gaps to
achieve full automation and coverage. Furthermore, we showcase for every
component requirement where in the CI/CD pipeline stage it is recommended to
test it and the tools to do so
Facing the Safety-Security Gap in RTES: the Challenge of Timeliness
Safety-critical real-time systems, including real-time
cyber-physical and industrial control systems, need not be solely
correct but also timely. Untimely (stale) results may have severe
consequences that could render the control systemâs behaviour
hazardous to the physical world. To ensure predictability and
timeliness, developers follow a rigorous process, which essentially
ensures real-time properties a priori, in all but the most unlikely
combinations of circumstances. However, we have seen the
complexity of both real-time applications, and the environments
they run on, increase. If this is matched with the also increasing
sophistication of attacks mounted to RTES systems, the case for
ensuring both safety and security through aprioristic predictability
loses traction, and presents an opportunity, which we take
in this paper, for discussing current practices of critical realtime
system design. To this end, with a slant on low-level task
scheduling, we first investigate the challenges and opportunities
for anticipating successful attacks on real-time systems. Then,
we propose ways for adapting traditional fault- and intrusiontolerant
mechanisms to tolerate such hazards. We found that
tasks which typically execute as analyzed under accidental faults,
may exhibit fundamentally different behavior when compromised
by malicious attacks, even with interference enforcement in place
Right On Time Distributed Shared Memory
The demand for real-time data storage in distributed control systems (DCSs) is growing. Yet, providing real- time DCS guarantees is challenging, especially when more and more sensor and actuator devices are connected to industrial plants and message loss needs to be taken into account. In this paper, we investigate how to build a shared memory abstraction for DCSs as a first step towards implementing different shared storage systems in a DCS context. We first prove that, in the presence of host crashes and message losses, the necessary guarantees of such an abstraction are impossible to implement using a traditional approach that has no access to the internals of existing DCS services, e.g., a modular approach where algorithms are built on top of existing software blocks like failure detectors. We propose a white-box approach that utilizes messages of existing services in any DCS as the sole means of communication. More precisely, we present TapeWorm, an algorithm that attaches itself to the heartbeat messages of the failure detector component in DCSs. We prove that TapeWorm implements the desired shared memory guarantees for applications running on a DCS. We also analyze the performance of TapeWorm and we showcase ways of adapting TapeWorm to various application needs and workloads
RepuCoin: Your Reputation is Your Power
Existing proof-of-work cryptocurrencies cannot tolerate attackers controlling more than 50% of the networkâs computing power at any time, but assume that such a condition happening is âunlikelyâ. However, recent attack sophistication, e.g., where attackers can rent mining capacity to obtain a majority of computing power temporarily, render this assumption unrealistic.
This paper proposes RepuCoin, the first system to provide guarantees even when more than 50% of the systemâs computing power is temporarily dominated by an attacker. RepuCoin physically limits the rate of voting power growth of the entire system. In particular, RepuCoin defines a minerâs power by its âreputationâ, as a function of its work integrated over the time of the entire blockchain, rather than through instantaneous computing power, which can be obtained relatively quickly and/or temporarily. As an example, after a single year of operation, RepuCoin can tolerate attacks compromising 51% of the networkâs computing resources, even if such power stays maliciously seized for almost a whole year. Moreover, RepuCoin provides better resilience to known attacks, compared to existing proof-of-work systems, while achieving a high throughput of 10000 transactions per second (TPS)
Byzantine Resilient Protocol for the IoT
Wireless sensor networks, often adhering to a single
gateway architecture, constitute the communication backbone
for many modern cyber-physical systems. Consequently, faulttolerance
in CPS becomes a challenging task, especially when
accounting for failures (potentially malicious) that incapacitate
the gateway or disrupt the nodes-gateway communication, not
to mention the energy, timeliness, and security constraints demanded
by CPS domains. This paper aims at ameliorating the
fault-tolerance of WSN based CPS to increase system and data
availability. To this end, we propose a replicated gateway architecture
augmented with energy-efficient real-time Byzantineresilient
data communication protocols. At the sensors level, we
introduce FT-TSTP, a geographic routing protocol capable of
delivering messages in an energy-efficient and timely manner
to multiple gateways, even in the presence of voids caused by
faulty and malicious sensor nodes. At the gateway level, we
propose a multi-gateway synchronization protocol, which we call
ByzCast, that delivers timely correct data to CPS applications,
despite the failure or maliciousness of a number of gateways. We
show, through extensive simulations, that our protocols provide
better system robustness yielding an increased system and data
availability while meeting CPS energy, timeliness, and security
demands
RepuCoin: Your Reputation is Your Power
Existing proof-of-work cryptocurrencies cannot tolerate attackers controlling more than 50% of the networkâs computing power at any time, but assume that such a condition happening is âunlikelyâ. However, recent attack sophistication, e.g., where attackers can rent mining capacity to obtain a majority of computing power temporarily, render this assumption unrealistic. This paper proposes RepuCoin, the first system to provide guarantees even when more than 50% of the systemâs computing power is temporarily dominated by an attacker. RepuCoin physically limits the rate of voting power growth of the entire system. In particular, RepuCoin defines a minerâs power by its âreputationâ, as a function of its work integrated over the time of the entire blockchain, rather than through instantaneous computing power, which can be obtained relatively quickly and/or temporarily. As an example, after a single year of operation, RepuCoin can tolerate attacks compromising 51% of the networkâs computing resources, even if such power stays maliciously seized for almost a whole year. Moreover, RepuCoin provides better resilience to known attacks, compared to existing proof-of-work systems, while achieving a high throughput of 10000 transactions per second (TPS)